Sales Analysis and Visualization

¶

Outline¶

This notebook provides an analysis on retail and warehouse sales. The data contains 307,645 sales records and 9 columns, including retail sales, warehouse sales, name of suppliers, and item descriptions.

The cleaning and formatting process includes:

  • filling missing data
  • checking fior duplicate records
  • dropping highly correlated columns
  • formatting the date

The analysis answers a variety of business questions such as:

  • Which item sold the most in quantity?
  • What is the overall sales trend per month and per year?
  • Which suppliers had the most profit in retail and warehouse sales?
  • Which item types had the most profit in retail and warehouse sales?
  • Which item had the most profit in retail and warehouse sales?

Most of the data visualizations are done using Plotly express because of their interactive capabilities. Seaborn is also used to create heatmaps.


What is the source of the data?¶

This data was acquired at: https://catalog.data.gov/dataset/warehouse-and-retail-sales¶

Import libraries¶

In [7]:
import pandas as pd
import os
import seaborn as sns
In [8]:
pwd = os.getcwd()
In [9]:
sales_data = pd.read_csv(pwd + "\\Warehouse_and_Retail_Sales.csv")

What is the size of the dataset?¶

In [10]:
shape = sales_data.shape
print(f"There are {shape[0]} rows and {shape[1]} columns.")
print(f"{shape[0]*shape[1]} total elements.")
There are 307645 rows and 9 columns.
2768805 total elements.

What are the columns of the dataset?¶

In [11]:
list(sales_data.columns)
Out[11]:
['YEAR',
 'MONTH',
 'SUPPLIER',
 'ITEM CODE',
 'ITEM DESCRIPTION',
 'ITEM TYPE',
 'RETAIL SALES',
 'RETAIL TRANSFERS',
 'WAREHOUSE SALES']

What are the data types of the column variables?¶

In [12]:
sales_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307645 entries, 0 to 307644
Data columns (total 9 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   YEAR              307645 non-null  int64  
 1   MONTH             307645 non-null  int64  
 2   SUPPLIER          307478 non-null  object 
 3   ITEM CODE         307645 non-null  object 
 4   ITEM DESCRIPTION  307645 non-null  object 
 5   ITEM TYPE         307644 non-null  object 
 6   RETAIL SALES      307642 non-null  float64
 7   RETAIL TRANSFERS  307645 non-null  float64
 8   WAREHOUSE SALES   307645 non-null  float64
dtypes: float64(3), int64(2), object(4)
memory usage: 21.1+ MB

Is there missing data in the dataset?¶

In [13]:
sales_data.isnull().sum()
Out[13]:
YEAR                  0
MONTH                 0
SUPPLIER            167
ITEM CODE             0
ITEM DESCRIPTION      0
ITEM TYPE             1
RETAIL SALES          3
RETAIL TRANSFERS      0
WAREHOUSE SALES       0
dtype: int64
In [14]:
print(f"There is {round(sales_data.isnull().sum().sum()/2768805,4)}% missing data in this dataset.")
There is 0.0001% missing data in this dataset.

Create a filter that only shows the rows that have missing values¶

Missing values for "SUPPLIER"¶

In [15]:
supplier_nan = pd.isnull(sales_data["SUPPLIER"])
sales_data[supplier_nan]
Out[15]:
YEAR MONTH SUPPLIER ITEM CODE ITEM DESCRIPTION ITEM TYPE RETAIL SALES RETAIL TRANSFERS WAREHOUSE SALES
107 2020 1 NaN 107 JIGGER MEASURE SHOT GLASS STR_SUPPLIES 14.69 18.0 0.0
189 2020 1 NaN 113 BARTENDERS BLACK BOOK STR_SUPPLIES 0.40 0.0 0.0
233 2020 1 NaN 115 PLASTIC SHOT GLASS PACK STR_SUPPLIES 5.71 6.0 0.0
254 2020 1 NaN 117 WHISKEY TASTING JOURNAL STR_SUPPLIES 0.08 0.0 0.0
263 2020 1 NaN 118 PLASTIC WINE GLASS PACK STR_SUPPLIES 7.40 10.0 0.0
... ... ... ... ... ... ... ... ... ...
307414 2020 9 NaN 3 COUPON NON-ALCOHOL NaN 0.0 0.0
307443 2020 9 NaN 4 RMS ITEM NON-ALCOHOL NaN 0.0 0.0
307493 2020 9 NaN 107 JIGGER MEASURE SHOT GLASS STR_SUPPLIES 8.62 16.0 0.0
307601 2020 9 NaN WC WINE CREDIT REF 0.00 0.0 -70.0
307605 2020 9 NaN 115 PLASTIC SHOT GLASS PACK STR_SUPPLIES 4.21 6.0 0.0

167 rows × 9 columns

Missing values for "ITEM TYPE"¶

In [16]:
item_type_nan = pd.isnull(sales_data["ITEM TYPE"])
sales_data[item_type_nan]
Out[16]:
YEAR MONTH SUPPLIER ITEM CODE ITEM DESCRIPTION ITEM TYPE RETAIL SALES RETAIL TRANSFERS WAREHOUSE SALES
106749 2017 10 REPUBLIC NATIONAL DISTRIBUTING CO 347939 FONTANAFREDDA BAROLO SILVER LABEL 750 ML NaN 0.0 0.0 1.0

Missing values for "RETAIL SALES"¶

In [17]:
retail_nan = pd.isnull(sales_data["RETAIL SALES"])
sales_data[retail_nan]
Out[17]:
YEAR MONTH SUPPLIER ITEM CODE ITEM DESCRIPTION ITEM TYPE RETAIL SALES RETAIL TRANSFERS WAREHOUSE SALES
18396 2020 7 NaN 4 RMS ITEM NON-ALCOHOL NaN 0.0 0.0
307414 2020 9 NaN 3 COUPON NON-ALCOHOL NaN 0.0 0.0
307443 2020 9 NaN 4 RMS ITEM NON-ALCOHOL NaN 0.0 0.0

Fill missing data, impute mode values for categorical data and mean values for numerical data¶

In [18]:
sales_data["SUPPLIER"].fillna(sales_data["SUPPLIER"].mode().iloc[0], inplace=True)

sales_data["ITEM TYPE"].fillna(sales_data["ITEM TYPE"].mode().iloc[0], inplace=True)

sales_data["RETAIL SALES"].fillna(sales_data["RETAIL SALES"].mean(), inplace=True)

sales_data.isnull().sum()
Out[18]:
YEAR                0
MONTH               0
SUPPLIER            0
ITEM CODE           0
ITEM DESCRIPTION    0
ITEM TYPE           0
RETAIL SALES        0
RETAIL TRANSFERS    0
WAREHOUSE SALES     0
dtype: int64

Check on rows that previously had missing data to ensure the imputation process worked¶

SUPPLIER imputation check¶

In [19]:
sales_data[supplier_nan]
Out[19]:
YEAR MONTH SUPPLIER ITEM CODE ITEM DESCRIPTION ITEM TYPE RETAIL SALES RETAIL TRANSFERS WAREHOUSE SALES
107 2020 1 REPUBLIC NATIONAL DISTRIBUTING CO 107 JIGGER MEASURE SHOT GLASS STR_SUPPLIES 14.690000 18.0 0.0
189 2020 1 REPUBLIC NATIONAL DISTRIBUTING CO 113 BARTENDERS BLACK BOOK STR_SUPPLIES 0.400000 0.0 0.0
233 2020 1 REPUBLIC NATIONAL DISTRIBUTING CO 115 PLASTIC SHOT GLASS PACK STR_SUPPLIES 5.710000 6.0 0.0
254 2020 1 REPUBLIC NATIONAL DISTRIBUTING CO 117 WHISKEY TASTING JOURNAL STR_SUPPLIES 0.080000 0.0 0.0
263 2020 1 REPUBLIC NATIONAL DISTRIBUTING CO 118 PLASTIC WINE GLASS PACK STR_SUPPLIES 7.400000 10.0 0.0
... ... ... ... ... ... ... ... ... ...
307414 2020 9 REPUBLIC NATIONAL DISTRIBUTING CO 3 COUPON NON-ALCOHOL 7.024071 0.0 0.0
307443 2020 9 REPUBLIC NATIONAL DISTRIBUTING CO 4 RMS ITEM NON-ALCOHOL 7.024071 0.0 0.0
307493 2020 9 REPUBLIC NATIONAL DISTRIBUTING CO 107 JIGGER MEASURE SHOT GLASS STR_SUPPLIES 8.620000 16.0 0.0
307601 2020 9 REPUBLIC NATIONAL DISTRIBUTING CO WC WINE CREDIT REF 0.000000 0.0 -70.0
307605 2020 9 REPUBLIC NATIONAL DISTRIBUTING CO 115 PLASTIC SHOT GLASS PACK STR_SUPPLIES 4.210000 6.0 0.0

167 rows × 9 columns

RETAIL SALES imputation check¶

In [20]:
sales_data[retail_nan]
Out[20]:
YEAR MONTH SUPPLIER ITEM CODE ITEM DESCRIPTION ITEM TYPE RETAIL SALES RETAIL TRANSFERS WAREHOUSE SALES
18396 2020 7 REPUBLIC NATIONAL DISTRIBUTING CO 4 RMS ITEM NON-ALCOHOL 7.024071 0.0 0.0
307414 2020 9 REPUBLIC NATIONAL DISTRIBUTING CO 3 COUPON NON-ALCOHOL 7.024071 0.0 0.0
307443 2020 9 REPUBLIC NATIONAL DISTRIBUTING CO 4 RMS ITEM NON-ALCOHOL 7.024071 0.0 0.0

ITEM TYPE imputation check¶

In [21]:
sales_data[item_type_nan]
Out[21]:
YEAR MONTH SUPPLIER ITEM CODE ITEM DESCRIPTION ITEM TYPE RETAIL SALES RETAIL TRANSFERS WAREHOUSE SALES
106749 2017 10 REPUBLIC NATIONAL DISTRIBUTING CO 347939 FONTANAFREDDA BAROLO SILVER LABEL 750 ML WINE 0.0 0.0 1.0

Duplicates (no duplicate records found)¶

In [22]:
sales_data[sales_data.duplicated()]
Out[22]:
YEAR MONTH SUPPLIER ITEM CODE ITEM DESCRIPTION ITEM TYPE RETAIL SALES RETAIL TRANSFERS WAREHOUSE SALES

Exploring Feature Correlations: Heatmap Analysis¶

In [23]:
sns.heatmap(sales_data.corr(),annot = True)
C:\Users\dchap\AppData\Local\Temp\ipykernel_24356\1292874986.py:1: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
  sns.heatmap(sales_data.corr(),annot = True)
Out[23]:
<AxesSubplot: >

Retail Transfers and Retail Sales are highly correlated (96%), let's drop Retail Transfers since we're more curious about the sales¶

In [24]:
sales_data.drop(columns= "RETAIL TRANSFERS", inplace = True)
sales_data.head()
Out[24]:
YEAR MONTH SUPPLIER ITEM CODE ITEM DESCRIPTION ITEM TYPE RETAIL SALES WAREHOUSE SALES
0 2020 1 REPUBLIC NATIONAL DISTRIBUTING CO 100009 BOOTLEG RED - 750ML WINE 0.00 2.0
1 2020 1 PWSWN INC 100024 MOMENT DE PLAISIR - 750ML WINE 0.00 4.0
2 2020 1 RELIABLE CHURCHILL LLLP 1001 S SMITH ORGANIC PEAR CIDER - 18.7OZ BEER 0.00 1.0
3 2020 1 LANTERNA DISTRIBUTORS INC 100145 SCHLINK HAUS KABINETT - 750ML WINE 0.00 1.0
4 2020 1 DIONYSOS IMPORTS INC 100293 SANTORINI GAVALA WHITE - 750ML WINE 0.82 0.0

Data Formatting¶

Formatting the date¶

In [25]:
#Format the new date in new column
sales_data["DATE"] = pd.to_datetime(sales_data["MONTH"].astype(str) + ' ' + sales_data["YEAR"].astype(str), format='%m %Y')

#Drop old date columns
sales_data.drop(columns = ["YEAR", "MONTH"], inplace = True)

#Reorder columns
sales_data = sales_data.loc[:,["DATE", "SUPPLIER", "ITEM CODE", "ITEM DESCRIPTION", "ITEM TYPE", "RETAIL SALES", "WAREHOUSE SALES"]]
sales_data.head()
Out[25]:
DATE SUPPLIER ITEM CODE ITEM DESCRIPTION ITEM TYPE RETAIL SALES WAREHOUSE SALES
0 2020-01-01 REPUBLIC NATIONAL DISTRIBUTING CO 100009 BOOTLEG RED - 750ML WINE 0.00 2.0
1 2020-01-01 PWSWN INC 100024 MOMENT DE PLAISIR - 750ML WINE 0.00 4.0
2 2020-01-01 RELIABLE CHURCHILL LLLP 1001 S SMITH ORGANIC PEAR CIDER - 18.7OZ BEER 0.00 1.0
3 2020-01-01 LANTERNA DISTRIBUTORS INC 100145 SCHLINK HAUS KABINETT - 750ML WINE 0.00 1.0
4 2020-01-01 DIONYSOS IMPORTS INC 100293 SANTORINI GAVALA WHITE - 750ML WINE 0.82 0.0

Sorting Data by date¶

In [26]:
sd_sorted_date = sales_data.sort_values("DATE")
sales_data  = sd_sorted_date
sales_data.head()
Out[26]:
DATE SUPPLIER ITEM CODE ITEM DESCRIPTION ITEM TYPE RETAIL SALES WAREHOUSE SALES
54813 2017-06-01 VINTUS LLC 43919 QUINTA DO NOVAL TAWNY 10YR - 750ML WINE 0.32 0.0
51848 2017-06-01 DANGEROUS WINE GROUP LLC 335314 PALISSADE ROSE - 750ML WINE 0.24 0.0
51847 2017-06-01 PWSWN INC 335309 MARC CELLARS CAB 12 - 750ML WINE 0.00 3.0
51846 2017-06-01 PWSWN INC 335289 PASO ROBLES PETITE COTE - 750ML WINE 0.00 6.0
51845 2017-06-01 A VINTNERS SELECTIONS 335285 SATELLITE S/BLC - 750ML WINE 0.00 6.0

Univariate Analysis¶

In [27]:
import plotly.express as px

Visualization of most occuring suppliers descending from most to least¶

In [28]:
fig1 = px.histogram(sales_data, x="SUPPLIER").update_xaxes(categoryorder='total descending')
fig1.show()

Who are the top 5 occurring suppliers?¶

In [29]:
print("Top 5 suppliers who sold the most: ")
sales_data["SUPPLIER"].value_counts()[0:5]
Top 5 suppliers who sold the most: 
Out[29]:
REPUBLIC NATIONAL DISTRIBUTING CO       21162
LEGENDS LTD                             13634
SOUTHERN GLAZERS WINE AND SPIRITS       11720
E & J GALLO WINERY                      10816
THE COUNTRY VINTNER, LLC DBA WINEBOW    10669
Name: SUPPLIER, dtype: int64

Bottom 5 occuring suppliers?¶

In [30]:
print("Top 5 suppliers who sold the least: ")
sales_data["SUPPLIER"].value_counts()[len(sales_data["SUPPLIER"].value_counts())-5:len(sales_data["SUPPLIER"].value_counts())]
Top 5 suppliers who sold the least: 
Out[30]:
FIOR DI SOLE LLC            1
BARREL ONE INC              1
ONE EIGHT DISTILLING LLC    1
CALVERT BREWING COMPANY     1
G GAUTHIER CELLARS INC      1
Name: SUPPLIER, dtype: int64

Which item sold the most in quantity?¶

In [31]:
fig2 = px.histogram(sales_data, x="ITEM DESCRIPTION").update_xaxes(categoryorder='total descending')
fig2.update_layout(bargap=.5)
fig2.show()

Top 5 items sold¶

In [32]:
print("Top 5 items sold: ")
sales_data["ITEM DESCRIPTION"].value_counts()[0:5]
Top 5 items sold: 
Out[32]:
BURGANS ALBARINO - 750ML                     44
SANTA JULIA TORRONTES SUSTAINABLE - 750ML    39
LA VIELLE FERME CDV RED - 750ML              35
LINGANORE BLACK RAVEN - 750ML                34
MCCLINTOCK HERITAGE WHITE WHISKEY - 750ML    33
Name: ITEM DESCRIPTION, dtype: int64

Bottom 5 items sold¶

In [33]:
print("Bottom 5 items sold: ")
sales_data["ITEM DESCRIPTION"].value_counts()[len(sales_data["ITEM DESCRIPTION"].value_counts())-5:len(sales_data["ITEM DESCRIPTION"].value_counts())]
Bottom 5 items sold: 
Out[33]:
BLUE MOON MANGO WHITE NR                1
YALUMBA THE SCRIBBLER - 750ML           1
UNION STEADY EDDIS WHEAT IPA 1/6 KEG    1
PERENNIAL SOUTHSIDE BLONDE - 1/2K       1
PLANET OREGON P/NOIR - 750ML            1
Name: ITEM DESCRIPTION, dtype: int64

Which item type sold the most in quantity?¶

In [34]:
fig3 = px.histogram(sales_data, x="ITEM TYPE").update_xaxes(categoryorder='total descending')
fig3.update_layout(bargap=.5)
fig3.show()

Top 4 item types sold¶

In [35]:
print("Top 4 item types sold: ")
sales_data["ITEM TYPE"].value_counts()[0:4]
Top 4 item types sold: 
Out[35]:
WINE      187641
LIQUOR     64910
BEER       42413
KEGS       10146
Name: ITEM TYPE, dtype: int64

Bottom 4 item types sold¶

In [36]:
print("Bottom 4 items least sold: ")
sales_data["ITEM TYPE"].value_counts()[len(sales_data["ITEM TYPE"].value_counts())-4:len(sales_data["ITEM TYPE"].value_counts())]
Bottom 4 items least sold: 
Out[36]:
NON-ALCOHOL     1908
STR_SUPPLIES     405
REF              127
DUNNAGE           95
Name: ITEM TYPE, dtype: int64

Statistics on retail sales and warehouse sales¶

In [37]:
sales_data["RETAIL SALES"].describe()
Out[37]:
count    307645.000000
mean          7.024071
std          30.986087
min          -6.490000
25%           0.000000
50%           0.320000
75%           3.270000
max        2739.000000
Name: RETAIL SALES, dtype: float64
In [38]:
sales_data["WAREHOUSE SALES"].describe()
Out[38]:
count    307645.000000
mean         25.294597
std         249.916798
min       -7800.000000
25%           0.000000
50%           1.000000
75%           5.000000
max       18317.000000
Name: WAREHOUSE SALES, dtype: float64

Bivariate Analysis¶

What is the overall sales trend per month?¶

Group the retail and warehouse sales by month and sum them¶

In [39]:
sales_per_month = sales_data.groupby(pd.Grouper(key='DATE', freq='M')).sum()
retail_per_month = sales_per_month["RETAIL SALES"]

ware_per_month = sales_per_month["WAREHOUSE SALES"]
month = sales_per_month.index
C:\Users\dchap\AppData\Local\Temp\ipykernel_24356\3842341077.py:1: FutureWarning:

The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.

Retail/Warehouse Visualization¶

In [40]:
fig2 = px.line(sales_per_month, x=month, y=[retail_per_month,ware_per_month])

fig2.update_layout(width=1000, title={
    "text": "Retail/Warehouse Sales per Month (4 Years)",
    "x": 0.5,
    "y": 0.95,
    "xanchor": "center",
    "yanchor": "top"
}, 
    xaxis_title = "Month",
    yaxis_title="Total Sales"
                  )
fig2.show()

What is the overall sales trend per year?¶

Group the retail and warehouse sales by year and sum them¶

In [41]:
sales_per_year = sales_data.groupby(pd.Grouper(key='DATE', freq='Y')).sum()
retail_per_year = sales_per_year["RETAIL SALES"]

ware_per_year = sales_per_year["WAREHOUSE SALES"]
year = sales_per_year.index
C:\Users\dchap\AppData\Local\Temp\ipykernel_24356\2448855587.py:1: FutureWarning:

The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.

Retail/Warehouse Visualization¶

In [70]:
fig3 = px.line(sales_per_year, x=year, y=[retail_per_year,ware_per_year])

fig3.update_layout(width=1000, title={
    "text": "Retail/Warehouse Sales Per Year",
    "x": 0.5,
    "y": 0.95,
    "xanchor": "center",
    "yanchor": "top"
}, 
    xaxis_title = "Time",
    yaxis_title="Total Sales"
                  )
fig3.show()

Which suppliers had the most profit in retail sales?¶

In [43]:
rsupplier_sales_data = sales_data.groupby("SUPPLIER")["RETAIL SALES"].sum().sort_values(ascending = False)
wsupplier_sales_data = sales_data.groupby("SUPPLIER")["WAREHOUSE SALES"].sum().sort_values(ascending = False)

rsupplier = rsupplier_sales_data.index
rsupplier_sales = rsupplier_sales_data.values

wsupplier = wsupplier_sales_data.index
wsupplier_sales = wsupplier_sales_data.values

#There are numerous suppliers so let's only focus on the top 20%
percent = .2
cutoff = int(round(percent * len(rsupplier_sales_data),0))
rsupplier = rsupplier[0:cutoff]
rsupplier_sales = rsupplier_sales[0:cutoff]

wsupplier = wsupplier[0:cutoff]
wsupplier_sales = wsupplier_sales[0:cutoff]
In [44]:
fig4 = px.bar(x=rsupplier, y=rsupplier_sales)

fig4.update_layout(
    title='Supplier Retail Sales (Top 20%)',
    xaxis_title='Suppliers',
    yaxis_title='Retail Sales'
)

fig4.show()
In [45]:
print(f'Top 5 Suppliers by Retail Sales: \n{rsupplier_sales_data[0:5]}')
Top 5 Suppliers by Retail Sales: 
SUPPLIER
E & J GALLO WINERY          166170.53
DIAGEO NORTH AMERICA INC    145343.20
CONSTELLATION BRANDS        131664.79
ANHEUSER BUSCH INC          109960.82
JIM BEAM BRANDS CO           96164.04
Name: RETAIL SALES, dtype: float64
In [46]:
print(f'Bottom 5 Suppliers by Retail Sales: \n{rsupplier_sales_data[len(rsupplier_sales_data)-5:len(rsupplier_sales_data)]}')
Bottom 5 Suppliers by Retail Sales: 
SUPPLIER
RSL HOLDINGS INC               0.0
CALVERT BREWING COMPANY        0.0
BREWDOG BREWING COMPANY LLC    0.0
MARQUEE SELECTIONS LLC         0.0
MARKERS EDGE LLC               0.0
Name: RETAIL SALES, dtype: float64

Which suppliers had the most profit in warehouse sales?¶

In [47]:
fig5 = px.histogram(x=wsupplier, y=wsupplier_sales)

fig5.update_layout(
    title='Supplier Warehouse Sales (Top 20%)',
    xaxis_title='Suppliers',
    yaxis_title='Warehouse Sales'
)

fig5.show()
In [48]:
print(f'Top 5 Suppliers by Warehouse Sales: \n{wsupplier_sales_data[0:5]}')
Top 5 Suppliers by Warehouse Sales: 
SUPPLIER
CROWN IMPORTS             1651871.51
MILLER BREWING COMPANY    1425428.71
ANHEUSER BUSCH INC        1331170.84
HEINEKEN USA               829796.46
E & J GALLO WINERY         197463.78
Name: WAREHOUSE SALES, dtype: float64
In [49]:
print(f'Bottom 5 Suppliers by Warehouse Sales: \n{wsupplier_sales_data[len(wsupplier_sales_data)-5:len(wsupplier_sales_data)]}')
Bottom 5 Suppliers by Warehouse Sales: 
SUPPLIER
MEIOMI                              0.0
MERRYVALE VINEYARDS                 0.0
ZURENA LLC                          0.0
ROBERT KACHER SELECTIONS LLC       -1.0
PREMIUM DISTRIBUTORS INC       -53782.0
Name: WAREHOUSE SALES, dtype: float64

Which item type had the most profit in retail sales?¶

In [50]:
rtype_sales_data = sales_data.groupby("ITEM TYPE")["RETAIL SALES"].sum().sort_values(ascending = False)
rtype = rtype_sales_data.index
rtype_sales = rtype_sales_data.values
In [51]:
fig4 = px.bar(x=rtype, y=rtype_sales)

fig4.update_layout(
    title='Item Type Retail Sales',
    xaxis_title='Item Type',
    yaxis_title='Retail Sales'
)

fig4.show()

Which item type had the most profit in warehouse sales?¶

In [54]:
wtype_sales_data = sales_data.groupby("ITEM TYPE")["WAREHOUSE SALES"].sum().sort_values(ascending = False)
wtype = wtype_sales_data.index
wtype_sales = wtype_sales_data.values
In [55]:
fig5 = px.histogram(x=wtype, y=wtype_sales)

fig5.update_layout(
    title='Item Type Warehouse Sales',
    xaxis_title='Item Type',
    yaxis_title='Warehouse Sales'
)

fig5.show()

Which item had the most profit in retail sales? Warehouse sales?¶

In [66]:
rdesc_sales_data = sales_data.groupby("ITEM DESCRIPTION")["RETAIL SALES"].sum().sort_values(ascending = False)
wdesc_sales_data = sales_data.groupby("ITEM DESCRIPTION")["WAREHOUSE SALES"].sum().sort_values(ascending = False)

#There are a lot of Item Descriptions, difficult for visualization to handle. This will cut it down to 1% of the data
percent = .01
cutoff = int(round(percent * len(rdesc_sales_data),0))

rdesc_sales_cut = rdesc_sales_data[0:cutoff]
wdesc_sales_cut = wdesc_sales_data[0:cutoff]

#Assign 1% of item description data to rdesc
rdesc = rdesc_sales_cut.index

#Assign retail sales to redesc_sales
rdesc_sales = list(rdesc_sales_cut)

#supplier_rsales_data[supplier_rsales_data.values > 1]

wdesc = wdesc_sales_cut.index
wdesc_sales = wdesc_sales_cut.values
Out[66]:
348
In [57]:
fig5 = px.bar(x=rdesc, y=rdesc_sales)

fig5.update_layout(
    title='Item Description by Retail Sales (Top 1%)',
    xaxis_title='Item Description',
    yaxis_title='Retail Sales'
)

fig5.show()
In [58]:
print(f'Top 5 Item Descriptions by Retail Sales: \n{rdesc_sales_data[0:5]}')
Top 5 Item Descriptions by Retail Sales: 
ITEM DESCRIPTION
TITO'S HANDMADE VODKA - 1.75L    27580.5
CORONA EXTRA LOOSE NR - 12OZ     25064.0
HEINEKEN LOOSE NR - 12OZ         17761.0
MILLER LITE 30PK CAN - 12OZ      14440.0
BUD LIGHT 30PK CAN               12299.0
Name: RETAIL SALES, dtype: float64
In [59]:
print(f'Bottom 5 Item Descriptions by Retail Sales: \n{rdesc_sales_data[len(rdesc_sales_data)-5:len(rdesc_sales_data)]}')
Bottom 5 Item Descriptions by Retail Sales: 
ITEM DESCRIPTION
FULLERS ESB 6/4 NR                              0.00
WEREWOLF CAB - 750ML                           -0.03
1800 SILVER TEQUILA - W/COASTER - 750ML        -0.17
BON APPETIT MUSCAT - 750ML                     -0.18
MARCHESI DI BAROLO BAROLO      31343 - 750ML   -0.25
Name: RETAIL SALES, dtype: float64
In [60]:
fig5 = px.bar(x=wdesc, y=wdesc_sales)

fig5.update_layout(
    title='Item Description by Warehouse Sales (Top 1%)',
    xaxis_title='Item Description',
    yaxis_title='Warehouse Sales'
)

fig5.show()
In [61]:
print(f'Top 5 Item Descriptions by Warehouse Sales: \n{wdesc_sales_data[0:5]}')
Top 5 Item Descriptions by Warehouse Sales: 
ITEM DESCRIPTION
CORONA EXTRA LOOSE NR - 12OZ    303160.83
CORONA EXTRA 2/12 NR - 12OZ     247924.43
HEINEKEN LOOSE NR - 12OZ        171950.21
HEINEKEN 2/12 NR - 12OZ         154654.49
MILLER LITE 30PK CAN - 12OZ     134486.40
Name: WAREHOUSE SALES, dtype: float64
In [62]:
print(f'Bottom 5 Item Descriptions by Warehouse Sales: \n{wdesc_sales_data[len(wdesc_sales_data)-5:len(wdesc_sales_data)]}')
Bottom 5 Item Descriptions by Warehouse Sales: 
ITEM DESCRIPTION
EMPTY 1/4 KEG (30.00)       -6697.0
ISTORE TRAINING OVERVIEW    -7800.0
BEER CREDIT                -10651.0
EMPTY 1/6 KEG (30.00)      -21950.0
EMPTY 1/2 KEG (30.00)      -92660.0
Name: WAREHOUSE SALES, dtype: float64

Thank you for looking through this notebook!¶

Any questions, comments, or suggestions would be greatly appreciated!¶